Clustering in Hypergraphs to Minimize Average Edge Service Time
We study the problem of clustering the vertices of a weighted hypergraph such that on average the vertices of each edge can be covered by a small number of clusters. This problem has many applications, such as designing medical tests, clustering files on disk servers, and placing network services on servers. The edges of the hypergraph model groups of items that are likely to be needed together, and the optimization criterion we use can be interpreted as the average delay (or cost) to serve the items of a typical edge. We describe and analyze algorithms for this problem for the case in which the clusters have to be disjoint and for the case where clusters can overlap. The analysis is often subtle and reveals interesting structure and invariants that one can utilize.
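The objective above is concrete enough to sketch: for a disjoint clustering, the cost of a hyperedge is the number of distinct clusters its vertices touch, averaged over edges by weight. A minimal illustration (function and variable names are my own, not from the paper):

```python
# Illustrative sketch of the paper's objective: the weighted average number
# of clusters needed to cover each hyperedge, for a given disjoint clustering.

def average_edge_service_cost(edges, weights, cluster_of):
    """edges: list of vertex sets; weights: parallel list of edge weights;
    cluster_of: dict mapping each vertex to its cluster id."""
    total_weight = sum(weights)
    cost = 0.0
    for edge, w in zip(edges, weights):
        clusters_touched = {cluster_of[v] for v in edge}  # clusters serving this edge
        cost += w * len(clusters_touched)
    return cost / total_weight

edges = [{1, 2}, {2, 3, 4}, {1, 4}]
weights = [1.0, 2.0, 1.0]
cluster_of = {1: "A", 2: "A", 3: "B", 4: "B"}  # clusters A={1,2}, B={3,4}
print(average_edge_service_cost(edges, weights, cluster_of))  # → 1.75
```

Here the clustering A = {1, 2}, B = {3, 4} serves edge {2, 3, 4} from two clusters, giving a weighted average cost of (1·1 + 2·2 + 1·2) / 4 = 1.75.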
Codes for Load Balancing in TCAMs: Size Analysis
Traffic splitting is a required functionality in networks, for example for
load balancing over paths or servers, or for enforcing a source's access restrictions.
The capacities of the servers (or the number of users with particular access
restrictions) determine the sizes of the parts into which traffic should be
split. A recent approach implements traffic splitting within the ternary
content addressable memory (TCAM), which is often available in switches. It is
important to reduce the amount of memory allocated for this task since TCAMs
are power consuming and are often also required for other tasks such as
classification and routing. Recent works suggested algorithms to compute a
smallest implementation of a given partition in the longest prefix match (LPM)
model. In this paper we analyze properties of such minimal representations and
prove lower and upper bounds on their size. The upper bounds hold for general
TCAMs, for which we also prove an additional lower bound. We further
analyze the expected size of a representation, for uniformly random ordered
partitions. We show that the expected representation size of a random partition
is at least half the size for the worst-case partition, and is linear in the
number of parts and in the logarithm of the size of the address space.
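To make the LPM model concrete, here is a toy simulation of traffic splitting with longest-prefix-match rules; the helper and rule set are illustrative assumptions, not the paper's construction. A partition into parts of sizes (6, 2) over a 3-bit address space needs only a default rule plus one longer prefix:

```python
# Hedged sketch: simulating the LPM (longest prefix match) model used to
# split traffic in a TCAM. Each rule is a bit-string prefix mapped to a part;
# an address matches the longest applicable prefix.

def lpm_lookup(address_bits, rules):
    """rules: list of (prefix, part) pairs; the longest matching prefix wins."""
    best = None
    for prefix, part in rules:
        if address_bits.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, part)
    return best[1]

# Split a 3-bit address space (8 addresses) into parts of sizes 6 and 2.
rules = [("", "part1"),    # default rule: everything to part1
         ("11", "part2")]  # the longer prefix carves out 2 addresses
sizes = {}
for a in range(8):
    bits = format(a, "03b")
    part = lpm_lookup(bits, rules)
    sizes[part] = sizes.get(part, 0) + 1
print(sizes)  # → {'part1': 6, 'part2': 2}
```

Sizes that are not powers of two generally require several prefixes per part; bounding how many rules a partition needs in the worst case and on average is what the abstract's analysis is about.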
Efficient Measurement on Programmable Switches Using Probabilistic Recirculation
Programmable network switches promise flexibility and high throughput,
enabling applications such as load balancing and traffic engineering. Network
measurement is a fundamental building block for such applications, including
tasks such as the identification of heavy hitters (largest flows) or the
detection of traffic changes.
However, high-throughput packet processing architectures place certain
limitations on the programming model, such as restricted branching, limited
capability for memory access, and a limited number of processing stages. These
limitations restrict the types of measurement algorithms that can run on
programmable switches. In this paper, we focus on the RMT programmable
high-throughput switch architecture, and carefully examine its constraints on
designing measurement algorithms. We demonstrate our findings while solving the
heavy hitter problem.
We introduce PRECISION, an algorithm that uses \emph{Probabilistic
Recirculation} to find top flows on a programmable switch. By recirculating a
small fraction of packets, PRECISION simplifies the access to stateful memory
to conform with RMT limitations and achieves higher accuracy than previous
heavy hitter detection algorithms that avoid recirculation. We also analyze the
effect of each architectural constraint on the measurement accuracy and provide
insights for measurement algorithm designers.

Comment: To appear in IEEE ICNP 201
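PRECISION's exact pipeline is specific to the RMT architecture, but the core idea of probabilistic recirculation can be caricatured in a few lines: a packet from an untracked flow claims the minimal table entry only with probability 1/(c_min + 1), emulating the small recirculated fraction. This is a loose single-table sketch under my own simplifications, not the paper's algorithm:

```python
import random

def precision_like(packets, table_size=4, seed=0):
    """Toy heavy-hitter tracker inspired by probabilistic recirculation."""
    rng = random.Random(seed)
    table = {}  # flow -> counter (stand-in for per-stage register arrays)
    for flow in packets:
        if flow in table:
            table[flow] += 1            # fast path: one stateful update
        elif len(table) < table_size:
            table[flow] = 1             # free entry available
        else:
            victim = min(table, key=table.get)
            # "probabilistic recirculation": evict the minimal entry only
            # with probability 1/(c_min + 1), modeling the small fraction
            # of packets that are recirculated through the pipeline
            if rng.random() < 1.0 / (table[victim] + 1):
                c = table.pop(victim)
                table[flow] = c + 1
    return table

table = precision_like(["A"] * 50 + ["B", "C", "D", "E", "F"] * 2)
print(table["A"])  # → 50 (the heavy flow is tracked exactly)
```

The design choice mirrored here is that heavy flows, once admitted, are counted on the fast path with a single memory update, while eviction pressure on their entries decays as their counters grow.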
Avoiding Flow Size Overestimation in the Count-Min Sketch with Bloom Filter Constructions
The Count-Min sketch is the most popular data structure for flow size estimation, a basic measurement task required in many networks. Typically the number of potential flows is large, which rules out maintaining a counter per flow in memory with a high access rate. The Count-Min sketch is probabilistic and relies on mapping each flow to multiple counters through hashing. This implies a potential estimation error: the size of a flow is overestimated when all of its counters are shared with other flows that have observed traffic. Although the estimation error can be probabilistically bounded, many applications can benefit from accurate flow size estimation and a guarantee that overestimation is completely avoided. We describe a design of the Count-Min sketch with accurate estimations whenever the number of flows with observed traffic follows a known bound, regardless of the identity of these particular flows. We make use of Bloom filters that avoid false positives and indicate the limitations of existing Bloom filter designs towards accurate size estimation. We suggest new Bloom filter constructions that scale to support a larger number of flows and explain how these imply the unique guarantee of accurate flow size estimation in the well-known Count-Min sketch.

Ori Rottenstreich was partially supported by the German-Israeli Foundation for Scientific Research and Development (GIF), by the Gordon Fund for System Engineering, as well as by the Technion Hiroshi Fujiwara Cyber Security Research Center and the Israel National Cyber Directorate. Pedro Reviriego would like to acknowledge the support of the ACHILLES project PID2019-104207RB-I00 and the Go2Edge network RED2018-102585-T funded by the Spanish Ministry of Science and Innovation and of the Madrid Community research project TAPIR-CM grant no. P2018/TCS-4496.
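For readers unfamiliar with the baseline, a textbook Count-Min sketch (without the paper's Bloom-filter machinery) looks roughly as follows; note that it can only overestimate, which is exactly the error the paper's constructions eliminate under a bound on the number of active flows:

```python
import hashlib

class CountMin:
    """Textbook Count-Min sketch, not the paper's overestimation-free
    variant: d rows of w counters; the estimate is the minimum counter."""
    def __init__(self, d=4, w=64):
        self.d, self.w = d, w
        self.rows = [[0] * w for _ in range(d)]

    def _idx(self, flow, row):
        # One independent-ish hash per row, derived from sha256.
        h = hashlib.sha256(f"{row}:{flow}".encode()).hexdigest()
        return int(h, 16) % self.w

    def add(self, flow, count=1):
        for r in range(self.d):
            self.rows[r][self._idx(flow, r)] += count

    def estimate(self, flow):
        # Collisions only inflate counters, so min over rows never
        # underestimates the true flow size.
        return min(self.rows[r][self._idx(flow, r)] for r in range(self.d))

cm = CountMin()
for _ in range(10):
    cm.add("f1")
print(cm.estimate("f1") >= 10)  # → True: the estimate is never too small
```

The one-sided error is the key structural fact: an estimate below the true size is impossible, so "accurate" in the abstract means eliminating the overestimation side.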
Invertible Bloom Lookup Tables with Listing Guarantees
The Invertible Bloom Lookup Table (IBLT) is a probabilistic concise data
structure for set representation that supports a listing operation as the
recovery of the elements in the represented set. Its applications can be found
in network synchronization and traffic monitoring as well as in
error-correction codes. An IBLT can list its elements with a success
probability that depends on the size of the allocated memory and the size of
the represented set, and listing can fail with small probability even for relatively small sets.
previous works only studied the failure probability of IBLT, this work
initiates the worst case analysis of IBLT that guarantees successful listing
for all sets of a certain size. The worst case study is important since the
failure of IBLT imposes high overhead. We describe a novel approach that
guarantees successful listing when the set satisfies a tunable upper bound on
its size. To allow that, we develop multiple constructions that are based on
various coding techniques such as stopping sets and the stopping redundancy of
error-correcting codes, Steiner systems, and covering arrays as well as new
methodologies we develop. We analyze the sizes of IBLTs with listing guarantees
obtained by the various methods as well as their mapping memory consumption.
Lastly, we study lower bounds on the achievable sizes of IBLT with listing
guarantees, and verify the results in the paper by simulations.
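A toy insert-only IBLT makes the listing (peeling) operation and its possible failure concrete. The hash placement below is a deterministic stand-in chosen for illustration; real constructions use random hash functions, and the paper's contribution is designing mappings that guarantee peeling succeeds for every set up to a given size:

```python
class IBLT:
    """Toy insert-only Invertible Bloom Lookup Table over integer keys:
    each of m cells keeps (count, keySum); each key maps to up to k cells.
    Listing repeatedly peels "pure" cells (count == 1)."""
    def __init__(self, m=20, k=3):
        self.m, self.k = m, k
        self.count = [0] * m
        self.key_sum = [0] * m

    def _cells(self, key):
        # Deterministic toy placement, NOT a real hash function; it stands
        # in for the k hash functions of a standard IBLT.
        return {(key * (i + 1) + i) % self.m for i in range(self.k)}

    def insert(self, key):
        for c in self._cells(key):
            self.count[c] += 1
            self.key_sum[c] ^= key

    def list_entries(self):
        """Peel pure cells until none remain (destructive on this toy).
        If peeling stalls before the table empties, listing has failed --
        the failure event whose worst case the paper studies."""
        recovered = []
        progress = True
        while progress:
            progress = False
            for c in range(self.m):
                if self.count[c] == 1:          # pure cell: holds one key
                    key = self.key_sum[c]
                    recovered.append(key)
                    for c2 in self._cells(key):  # remove the key everywhere
                        self.count[c2] -= 1
                        self.key_sum[c2] ^= key
                    progress = True
        return recovered

t = IBLT()
for key in (5, 9, 12):
    t.insert(key)
print(sorted(t.list_entries()))  # → [5, 9, 12]
```

A listing guarantee in the paper's sense corresponds to choosing m, k, and the cell mapping so that peeling provably never stalls for any set up to the tunable size bound.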
Adaptive One Memory Access Bloom Filters
Bloom filters are widely used to perform fast approximate membership checking in networking applications. The main limitation of Bloom filters is that they produce false positives, which can only be reduced by using more memory. We propose to take advantage of the common repetition in the identity of queried elements, adapting Bloom filters to avoid false positives for elements that repeat across queries. In this paper, one memory access Bloom filters are used to design an adaptation scheme that can effectively remove false positives while completing every query in a single memory access. The proposed filters are well suited for scenarios in which the number of memory bits per element is low, and thus complement existing adaptive cuckoo filters, which are not efficient in that case. Evaluation results using packet traces show that the proposed adaptive Bloom filters can significantly reduce the false positive rate in networking applications with a single memory access. In particular, when using as few as four bits per element, false positive rates below 5% are achieved.

This work was supported by the ACHILLES project PID2019-104207RB-I00 and the Go2Edge network RED2018-102585-T funded by the Spanish Agencia Estatal de Investigación (AEI) 10.13039/501100011033 and by the Madrid Community research project TAPIR-CM grant no. P2018/TCS-4496.
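The "one memory access" building block is essentially a blocked Bloom filter: a single hash selects a word-sized block and all k bits live inside that block, so each query reads one memory word. A minimal sketch under my own parameter choices (the adaptation scheme from the abstract is not shown):

```python
import hashlib

class BlockedBloom:
    """Sketch of a one-memory-access Bloom filter: one hash picks a
    word-sized block, and all k bits are set/tested inside that block,
    so a query touches a single memory word."""
    def __init__(self, num_blocks=64, block_bits=64, k=4):
        self.num_blocks, self.block_bits, self.k = num_blocks, block_bits, k
        self.blocks = [0] * num_blocks  # each int models one memory word

    def _positions(self, item):
        # Derive the block index and k in-block bit positions from one hash.
        h = int(hashlib.sha256(str(item).encode()).hexdigest(), 16)
        block = h % self.num_blocks
        h //= self.num_blocks
        bits = []
        for _ in range(self.k):
            bits.append(h % self.block_bits)
            h //= self.block_bits
        return block, bits

    def add(self, item):
        block, bits = self._positions(item)
        for b in bits:
            self.blocks[block] |= 1 << b

    def query(self, item):
        # May return a false positive, never a false negative.
        block, bits = self._positions(item)
        return all(self.blocks[block] >> b & 1 for b in bits)

bf = BlockedBloom()
bf.add("flow1")
print(bf.query("flow1"))  # → True (no false negatives)
```

Confining all k bits to one block slightly raises the false positive rate versus a classic Bloom filter of the same size, which is the trade the abstract's adaptation scheme is built around.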